3 Week 3
This session covers two basic areas: (1) downloading census and international data using tidycensus, tigris, and idbr, and (2) mapping and GIS analysis with leaflet, mapview, and sf.
Download the file week03.Rmd and use that as the base. Change the second line in the YAML header so it uses your name and your web site. See UW Students Web Server if you do not have a web site.
Topics
- tidycensus: Load US Census Boundary and Attribute Data as ‘tidyverse’ and ‘sf’-Ready Data Frames
- tigris: Load Census TIGER/Line Shapefiles
- idbr: R Interface to the US Census Bureau International Data Base API
- leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library
- mapview: Interactive Viewing of Spatial Data in R
- sf: Simple Features for R
3.1 Getting US Census data with tigris, tidycensus
Dealing with US Census data can be overwhelming, particularly when using the raw text-based data. The Census Bureau has an API that allows more streamlined downloads of variables (as data frames) and geographies (as simple features). You need an API key, which is available for free. See tidycensus and tidycensus basic usage, and for a complete treatment, Analyzing US Census Data.
tidycensus uses tigris, which downloads the geographic data portion of the census files.
A simple example will download the variables representing the count of White, Black/African American, American Indian/Alaska Native, and Asian persons from the American Community Survey (ACS) data for King County in 2019.
3.1.1 US Census API key installation
For this example to run, you need to have your US Census API key installed. Run this code, substituting your actual API key for the asterisks.
# set the census API key and the persistent tigris cache location
tidycensus::census_api_key("*****************", install = TRUE)
tigris::tigris_cache_dir("H:/tigris_cache")
Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY").
To use now, restart R or run readRenviron("~/.Renviron")
You should enable the API key:
readRenviron("~/.Renviron")
The option install = TRUE writes a line in your ~/.Renviron file (~/ is shorthand for “home directory”; under Windows, typically your C:\Users\username\Documents folder; under macOS, typically /Users/username). Similarly, tigris_cache_dir() writes to the ~/.Renviron file. For example, my .Renviron file (Figure 1) shows the API key and the persistent tigris cache folder.
Figure 1: Census API key stored in ~/.Renviron
With the API key installed, you can simply load the tidycensus package and download data. When R starts, it reads the ~/.Renviron file and creates system environment variables for the R session; in this case I’m setting two variables (TIGRIS_CACHE_DIR and CENSUS_API_KEY). You can make other environment variables available in R by adding them to this file.
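The mechanism can be tried out without touching your real ~/.Renviron. The following is a minimal sketch that writes a throwaway file in the same KEY=value format, reads it with readRenviron(), and retrieves the value with Sys.getenv(). The variable names DEMO_CENSUS_KEY and DEMO_NOT_SET_ANYWHERE and the value abc123 are made up for illustration.

```r
# write a temporary file in .Renviron format (one KEY=value pair per line)
demo_renviron <- tempfile(fileext = ".Renviron")
writeLines("DEMO_CENSUS_KEY=abc123", demo_renviron)

# load the file into the current session, as R does with ~/.Renviron at startup
readRenviron(demo_renviron)

# environment variables are retrieved by name
demo_key <- Sys.getenv("DEMO_CENSUS_KEY")
print(demo_key)

# a variable that has not been set comes back as an empty string, not an error
missing_key <- Sys.getenv("DEMO_NOT_SET_ANYWHERE")
print(nzchar(missing_key))
```

Checking nzchar(Sys.getenv("CENSUS_API_KEY")) in the same way is a quick test of whether your own key is visible to the current R session.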
3.1.2 Census variables
First, it should be noted that not all data products and variables are available for all census geographic units. See Geography and variables in tidycensus for a list of which geographic units are available for different data products using tidycensus.
tidycensus has a helper function for obtaining the lists of variables and their descriptions. This is necessary because the list of variables is quite long and the variable names are codes that are functionally unintelligible.
Census variable lists are obtained using the load_variables() function, whose arguments specify the year, the data set, and whether or not to cache the results. Because the variable lists are large, it may make sense to cache them.
Here we will list the variables for the 2019 ACS 5-year average data:
library(tidycensus)
v2019 <- load_variables(year = 2019, dataset = "acs5", cache = TRUE)
The table, which has 27040 records, can then be browsed using the function View(v2019). Using the tabular view in this way makes it convenient to search for variable names or concepts using the filters. For example, we can search for the term “race” in the concept field, as shown in Figure 2.
Figure 2: ACS 5 year variables
However, this interface uses only free-text searches and will match any record containing the search string, whether the string appears as a whole word or as part of a longer word. More specific searches are possible using R syntax. For example, to list all variables tagged with the concept “RACE,” filter on the exact string (Table 3.1).
library(dplyr)
library(knitr)
library(kableExtra)
# keep only the variables whose concept is exactly "RACE"
v2019 %>%
  filter(concept == "RACE") %>%
  kable(caption = 'ACS 5 year variables for the concept "race"') %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left")
| name | label | concept |
|---|---|---|
| B02001_001 | Estimate!!Total: | RACE |
| B02001_002 | Estimate!!Total:!!White alone | RACE |
| B02001_003 | Estimate!!Total:!!Black or African American alone | RACE |
| B02001_004 | Estimate!!Total:!!American Indian and Alaska Native alone | RACE |
| B02001_005 | Estimate!!Total:!!Asian alone | RACE |
| B02001_006 | Estimate!!Total:!!Native Hawaiian and Other Pacific Islander alone | RACE |
| B02001_007 | Estimate!!Total:!!Some other race alone | RACE |
| B02001_008 | Estimate!!Total:!!Two or more races: | RACE |
| B02001_009 | Estimate!!Total:!!Two or more races:!!Two races including Some other race | RACE |
| B02001_010 | Estimate!!Total:!!Two or more races:!!Two races excluding Some other race, and three or more races | RACE |
This shows that the variable for the count of all persons is B02001_001 and the variables for White alone, Black alone, American Indian/Alaskan Native alone, Asian alone, and some other race alone are B02001_002, B02001_003, B02001_004, B02001_005, and B02001_007, respectively. One could also use regular expressions to search for desired patterns in the concept.
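As a minimal sketch of the regular expression approach, the following uses a small hand-typed data frame (vars, a stand-in for v2019 with a few rows copied from Table 3.1) so that it runs without an API key. With the full table, the same grepl() calls could be placed inside filter().

```r
# toy copy of a few rows of the variable table (values taken from Table 3.1)
vars <- data.frame(
  name = c("B02001_001", "B02001_002", "B02001_003", "B02001_005"),
  label = c(
    "Estimate!!Total:",
    "Estimate!!Total:!!White alone",
    "Estimate!!Total:!!Black or African American alone",
    "Estimate!!Total:!!Asian alone"
  ),
  concept = "RACE",
  stringsAsFactors = FALSE
)

# keep rows whose label ends in "alone"; `$` anchors the pattern to the end
alone <- vars[grepl("alone$", vars$label), ]
print(alone$name)

# case-insensitive match on "race" as a whole word; \\b marks a word boundary
race_rows <- vars[grepl("\\brace\\b", vars$concept, ignore.case = TRUE, perl = TRUE), ]
print(nrow(race_rows))
```

The word-boundary pattern addresses the limitation of the free-text filter noted earlier: it matches “RACE” but would not match a longer word that merely contains those letters.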
3.1.3 Downloading data
tidycensus has two main functions, get_decennial() for downloading decennial data and get_acs() for downloading American Community Survey (ACS) data. The functions are quite similar in structure.
Here we will define a set of variables using a named vector of variable names.
# the census variables
census_vars <- c(
  p_denom_race = "B02001_001",
  p_n_white = "B02001_002",
  p_n_afram = "B02001_003",
  p_n_aian = "B02001_004",
  p_n_asian = "B02001_005"
)
The “named vector” refers to the elements of the vector having names. For example, the first element above has the name p_denom_race and the value “B02001_001.” This construction is used so that when the data are downloaded, the resultant data frame will have more readable names.
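To make the idea concrete, here is a small sketch using a shortened copy of the vector (census_vars_demo is a name used only for this illustration). It shows how names and values are stored separately, and builds by hand the column names that appear in the “wide” ACS output, where an E (estimate) or M (margin of error) suffix is attached to each name.

```r
# a named character vector: names on the left, values on the right
census_vars_demo <- c(
  p_denom_race = "B02001_001",
  p_n_white = "B02001_002"
)

# the names and the values can be inspected separately
print(names(census_vars_demo))
print(unname(census_vars_demo))

# elements can be looked up by name
white_code <- census_vars_demo[["p_n_white"]]
print(white_code)

# interleave the E and M versions of each name, mimicking the wide column order
wide_cols <- as.vector(rbind(
  paste0(names(census_vars_demo), "E"),
  paste0(names(census_vars_demo), "M")
))
print(wide_cols)
```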
Next, we will download the actual data. Here we will be downloading census tract level data for King County, WA, from the 5 year ACS estimates for the year ending in 2019. We also specify options(tigris_use_cache=TRUE) so that any shapefile data downloaded are cached and will not be re-downloaded if not necessary. Note in Figure 1 I had specified a location where I wanted my tigris cache to be located (H:/tigris_cache). Another important option is output = "wide", which generates an output table with one record per census unit and columns representing the variables. A tidy output would present a “long” table with repeated records per census unit, one record per variable estimate.
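# cache downloaded shapefiles so they are not re-downloaded unnecessarily
options(tigris_use_cache = TRUE)
# get the data
ctdat <- get_acs(
  geography = "tract",
  variables = census_vars,
  cache_table = TRUE,
  year = 2019,
  output = "wide",
  state = "WA",
  county = "King",
  geometry = TRUE,
  survey = "acs5"
)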
## Getting data from the 2015-2019 5-year ACS
## Using FIPS code '53' for state 'WA'
## Using FIPS code '033' for 'King County'
# reproject to WGS84 (EPSG:4326); %<>% is the magrittr assignment pipe
ctdat %<>% st_transform(4326)
A few values are shown in Table 3.2. Because the ACS data include a margin of error (MOE), the estimate is represented with the variable name having the terminal character “E” and the MOE is represented with the variable name having the terminal character “M.” The variables are shown with the names we specified earlier. Without the named vector, the downloaded variables would be given the raw variable names, which are generally not helpful. Note also that because of the geometry = TRUE option, there is a geometry column containing geographic data. The “wide” format data are more amenable to applications requiring one record per census unit, because other census-unit level data are typically represented with one record per census unit, which allows for table joins, for example in mapping applications.
# print a few records
ctdat %>%
  head() %>%
  kable(caption = "Selected census tract variables from the 5-year ACS from 2019 for King County, WA") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left")
| GEOID | NAME | p_denom_raceE | p_denom_raceM | p_n_whiteE | p_n_whiteM | p_n_aframE | p_n_aframM | p_n_aianE | p_n_aianM | p_n_asianE | p_n_asianM | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53033011300 | Census Tract 113, King County, Washington | 6656 | 447 | 3412 | 323 | 480 | 209 | 133 | 100 | 880 | 409 | MULTIPOLYGON (((-122.3551 4… |
| 53033004900 | Census Tract 49, King County, Washington | 7489 | 605 | 6469 | 654 | 15 | 25 | 18 | 24 | 520 | 225 | MULTIPOLYGON (((-122.3555 4… |
| 53033026801 | Census Tract 268.01, King County, Washington | 6056 | 642 | 2561 | 615 | 542 | 426 | 184 | 162 | 777 | 378 | MULTIPOLYGON (((-122.3551 4… |
| 53033006400 | Census Tract 64, King County, Washington | 3739 | 192 | 3101 | 231 | 62 | 45 | 38 | 35 | 231 | 115 | MULTIPOLYGON (((-122.3126 4… |
| 53033005100 | Census Tract 51, King County, Washington | 3687 | 236 | 3066 | 230 | 116 | 135 | 8 | 14 | 228 | 58 | MULTIPOLYGON (((-122.3364 4… |
| 53033002000 | Census Tract 20, King County, Washington | 3854 | 271 | 3129 | 290 | 54 | 76 | 9 | 13 | 431 | 139 | MULTIPOLYGON (((-122.3177 4… |
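One practical consequence of the wide layout is that derived measures are plain column arithmetic. The sketch below retypes the estimates for two tracts from Table 3.2 into a small data frame (tracts, with the geometry column left out) and computes percentages; the column names pct_white and pct_asian are my own choices for illustration, and the margins of error are ignored here.

```r
# two tracts copied from Table 3.2 (estimates only, geometry omitted)
tracts <- data.frame(
  GEOID = c("53033011300", "53033004900"),
  p_denom_raceE = c(6656, 7489),
  p_n_whiteE = c(3412, 6469),
  p_n_asianE = c(880, 520)
)

# percentages are simple column arithmetic because each tract is one row
tracts$pct_white <- round(100 * tracts$p_n_whiteE / tracts$p_denom_raceE, 1)
tracts$pct_asian <- round(100 * tracts$p_n_asianE / tracts$p_denom_raceE, 1)

print(tracts[, c("GEOID", "pct_white", "pct_asian")])
```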
An example of data downloaded in “long” format is shown in Table 3.3. Here, the column variable indicates which measure is stored in the record, and estimate and moe hold the data values for that particular variable \(\times\) census unit combination. Data in this format are more amenable to generating data summaries than to applications such as mapping.
# get the data
ctdatlong <- get_acs(
  geography = "tract",
  variables = census_vars,
  cache_table = TRUE,
  year = 2019,
  output = "tidy",
  state = "WA",
  county = "King",
  geometry = TRUE,
  survey = "acs5"
)
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
ctdatlong %>%
  head() %>%
  kable(caption = "Selected census tract variables from the 5-year ACS from 2019 for King County, WA (long format)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left")
| GEOID | NAME | variable | estimate | moe | geometry |
|---|---|---|---|---|---|
| 53033011300 | Census Tract 113, King County, Washington | p_denom_race | 6656 | 447 | MULTIPOLYGON (((-122.3551 4… |
| 53033011300 | Census Tract 113, King County, Washington | p_n_white | 3412 | 323 | MULTIPOLYGON (((-122.3551 4… |
| 53033011300 | Census Tract 113, King County, Washington | p_n_afram | 480 | 209 | MULTIPOLYGON (((-122.3551 4… |
| 53033011300 | Census Tract 113, King County, Washington | p_n_aian | 133 | 100 | MULTIPOLYGON (((-122.3551 4… |
| 53033011300 | Census Tract 113, King County, Washington | p_n_asian | 880 | 409 | MULTIPOLYGON (((-122.3551 4… |
| 53033004900 | Census Tract 49, King County, Washington | p_denom_race | 7489 | 605 | MULTIPOLYGON (((-122.3555 4… |
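The two layouts hold the same information, so one can be converted to the other after download. As a rough sketch, the following retypes three long-format rows for one tract from Table 3.3 (geometry omitted) and reshapes them with pivot_wider() from the tidyr package, which is assumed to be installed. Note that the resulting column names (for example estimate_p_n_white) follow the tidyr default and differ from the E and M suffixes that tidycensus uses for output = "wide".

```r
library(tidyr)

# long-format rows for one tract, copied from Table 3.3 (geometry omitted)
long <- data.frame(
  GEOID = "53033011300",
  variable = c("p_denom_race", "p_n_white", "p_n_afram"),
  estimate = c(6656, 3412, 480),
  moe = c(447, 323, 209)
)

# spread each variable into its own estimate and moe columns
wide <- pivot_wider(
  long,
  names_from = variable,
  values_from = c(estimate, moe)
)

print(names(wide))
print(nrow(wide))
```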


